Gene expression analysis with the parametric bootstrap.

Authors

  • M J van der Laan
  • J Bryan
Abstract

Recent developments in microarray technology make it possible to capture the gene expression profiles for thousands of genes at once. With this data researchers are tackling problems ranging from the identification of 'cancer genes' to the formidable task of adding functional annotations to our rapidly growing gene databases. Specific research questions suggest patterns of gene expression that are interesting and informative: for instance, genes with large variance or groups of genes that are highly correlated. Cluster analysis and related techniques are proving to be very useful. However, such exploratory methods alone do not provide the opportunity to engage in statistical inference. Given the high dimensionality (thousands of genes) and small sample sizes (often <30) encountered in these datasets, an honest assessment of sampling variability is crucial and can prevent the over-interpretation of spurious results. We describe a statistical framework that encompasses many of the analytical goals in gene expression analysis; our framework is completely compatible with many of the current approaches and, in fact, can increase their utility. We propose the use of a deterministic rule, applied to the parameters of the gene expression distribution, to select a target subset of genes that are of biological interest. In addition to subset membership, the target subset can include information about relationships between genes, such as clustering. This target subset presents an interesting parameter that we can estimate by applying the rule to the sample statistics of microarray data. The parametric bootstrap, based on a multivariate normal model, is used to estimate the distribution of these estimated subsets and relevant summary measures of this sampling distribution are proposed. We focus on rules that operate on the mean and covariance. 
Using Bernstein's Inequality, we obtain consistency of the subset estimates, under the assumption that the sample size converges faster to infinity than the logarithm of the number of genes. We also provide a conservative sample size formula guaranteeing that the sample mean and sample covariance matrix are uniformly within a distance epsilon > 0 of the population mean and covariance. The practical performance of the method using a cluster-based subset rule is illustrated with a simulation study. The method is illustrated with an analysis of a publicly available leukemia data set.
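
The procedure outlined in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it fits a multivariate normal model to the data, applies a deterministic rule to the estimated mean and covariance, and uses the parametric bootstrap to record how often each gene re-enters the selected subset. The rule (`subset_rule`, keeping genes whose mean exceeds a threshold), the helper names, the toy data generator, and the small diagonal jitter in the Cholesky step are all assumptions made for the example. The sketch also assumes more samples than genes so that the sample covariance is positive definite, which is usually not the case for real microarray data.

```python
import math
import random

def mean_and_cov(data):
    """Sample mean vector and covariance matrix; rows are samples, columns are genes."""
    n, p = len(data), len(data[0])
    mu = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[sum((row[j] - mu[j]) * (row[k] - mu[k]) for row in data) / (n - 1)
            for k in range(p)] for j in range(p)]
    return mu, cov

def cholesky(a, jitter=1e-9):
    """Lower-triangular L with L L^T = a + jitter * I (jitter is an assumed stabiliser)."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] + jitter - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def draw_mvn(mu, L, n, rng):
    """Draw n observations from N(mu, L L^T) as x = mu + L z with z standard normal."""
    p = len(mu)
    out = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        out.append([mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)])
    return out

def subset_rule(mu, cov, min_mean=0.0):
    """Hypothetical rule on the mean: keep genes whose mean expression exceeds min_mean."""
    return {j for j, m in enumerate(mu) if m > min_mean}

def bootstrap_subsets(data, rule, n_boot=200, seed=0):
    """Return the estimated subset and each gene's bootstrap inclusion frequency."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    mu_hat, cov_hat = mean_and_cov(data)
    L = cholesky(cov_hat)
    estimate = rule(mu_hat, cov_hat)
    counts = [0] * p
    for _ in range(n_boot):
        resample = draw_mvn(mu_hat, L, n, rng)   # parametric resample of size n
        mu_b, cov_b = mean_and_cov(resample)     # re-estimate the parameters
        for j in rule(mu_b, cov_b):              # re-apply the same rule
            counts[j] += 1
    return estimate, [c / n_boot for c in counts]

def toy_data(true_means, n=20, seed=1):
    """Toy data with independent unit-variance genes (illustration only)."""
    rng = random.Random(seed)
    return [[rng.gauss(m, 1.0) for m in true_means] for _ in range(n)]

if __name__ == "__main__":
    data = toy_data([3.0, -3.0, 0.1, 3.0])
    estimate, freq = bootstrap_subsets(data, subset_rule)
    print("estimated subset:", sorted(estimate))
    print("inclusion frequencies:", freq)
```

The per-gene inclusion frequency is only one possible summary of the sampling distribution of the estimated subset. A rule based on clustering or on the covariance would slot into the same loop by replacing `subset_rule`.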

Related articles

Comparison of Gene Expression Programming (GEP) and Parametric and Non-parametric Regression Methods in the Prediction of the Mean Daily Discharge of Karun River (A case Study: Mollasani Hydrometric Station)

Nowadays, the prediction of river discharge is one of the important issues in hydrology and water resources; the results of daily river discharge pattern could be used in the management of water resources and hydraulic structures and flood prediction. In this research, Gene Expression Programming (GEP), parametric Linear Regression (LR), parametric Nonlinear Regression (NLR) and non-parametric ...

Comparison of the Alterations of Gene Expression Related to Signaling Pathways of Synthesis and Degradation of Skeletal Muscle Protein Induced by Two Exercise Training Protocols

Background and Objectives: Skeletal muscle mass depends on the balance between synthesis and degradation of muscle protein, which changes with aging and disease. The aim of the present research was to examine the effects of two exercise training protocols on alterations of some genes involved in pathways of protein synthesis and degradation in order to achieve a more effective training program i...

A Bootstrap Interval Robust Data Envelopment Analysis for Estimate Efficiency and Ranking Hospitals

Data envelopment analysis (DEA) is one of non-parametric methods for evaluating efficiency of each unit. Limited resources in healthcare economy is the main reason in measuring efficiency of hospitals. In this study, a bootstrap interval data envelopment analysis (BIRDEA) is proposed for measuring the efficiency of hospitals affiliated with the Hamedan University of Medical Sciences. The propos...

Comparing two testing procedures in unbalanced two-way ANOVA models under heteroscedasticity: Approximate degree of freedom and parametric bootstrap approach

The classic F-test is usually used for testing the effects of factors in homoscedastic two-way ANOVA models. However, the assumption of equal cell variances is usually violated in practice. In recent years, several test procedures have been proposed for testing the effects of factors. In this paper, the two methods that are approximate degree of freedom (ADF) and parametric bootstr...

Functional-Coefficient Autoregressive Model and its Application for Prediction of the Iranian Heavy Crude Oil Price

Time series and their methods of analysis are important subjects in statistics. Most of time series have a linear behavior and can be modelled by linear ARIMA models. However, some of realized time series have a nonlinear behavior and for modelling them one needs nonlinear models. For this, many good parametric nonlinear models such as bilinear model, exponential autoregressive model, threshold...

Comparing the effects of endurance and resistance trainings on gene expression involved in protein synthesis and degradation signaling pathways of Wistar rat soleus muscle

Background: Skeletal muscle mass, which is regulated by a balance between muscle protein synthesis and degradation, is an important factor for movement to meet everyday needs, especially in pathological conditions and aging. The purpose of the present investigation was to compare the alterations of the gene expression involved in muscle protein synthesis and degradation signaling pathways induc...

Journal:
  • Biostatistics

Volume 2, Issue 4

Pages: -

Publication date: 2001